~ chicken-core (chicken-5) /manual/Data representation
Trap1[[toc:]]
2[[tags: manual]]
3
4== Data representation
5
6There exist two different kinds of data objects in the CHICKEN system:
7immediate and non-immediate objects.
8
9=== Immediate objects
10
11Immediate objects are represented by a single machine word, 32 or 64 bits depending on the architecture. They come in four different flavors:
12
13'''fixnums''', that is, small exact integers, where the lowest order bit is
14set to 1. This gives fixnums a range of 31 bits for the actual
15numeric value (63 bits on 64-bit architectures).
16
17'''characters''', where the four lowest-order bits are equal to
18{{C_CHARACTER_BITS}}, currently 1010. The Unicode code point
19of the character is encoded in the next 24 bits.
20
21'''booleans''', where the four lowest-order bits are equal to {{C_BOOLEAN_BITS}},
22currently 0110. The next bit is one for #t and zero for #f.
23
24'''other values''': the empty list, the value of unbound identifiers,
25the undefined value (void), and end-of-file. The four lowest-order bits are equal to
26{{C_SPECIAL_BITS}}, currently 1110. The next four bits contain an identifying
27number for this type of object, one of:
28{{C_SCHEME_END_OF_LIST}}, currently 0000;
29{{C_SCHEME_UNDEFINED}}, currently 0001;
30{{C_SCHEME_UNBOUND}}, currently 0010; or
31{{C_SCHEME_END_OF_FILE}}, currently 0011.
32
33=== Non-immediate objects
34
35Collectively, the two lowest-order bits are known as the ''immediate mark bits''. When the lowest bit is set, the object is a fixnum, as described above, and the next bit is part of its value. When the lowest bit is clear but the next bit is set, it is an immediate object other than a fixnum. If neither bit is set, the object is non-immediate, as described below.
36
37Non-immediate objects are blocks of data represented by a pointer into
38the heap. The pointer's immediate mark bits must be zero to indicate the object is non-immediate;
39this guarantees the data block is aligned on a 4-byte boundary, at minimum. Alignment of data words
40is required on modern architectures anyway, so we get the ability to distinguish between immediate and non-immediate objects for free.
41
42The first word of the data block contains a header, which gives
43information about the type of the object. The header is a
44single machine word.
45
46The 24 (56 on 64-bit systems) lowest-order bits contain the length of
47the data object, which is either the number of bytes in a string or
48byte-vector, or the number of elements for a vector or record type. This
49allows a maximum size for string or byte-vectors of 2^24 bytes, or
50approximately 16 MB, on 32-bit systems, and 2^56 bytes, or approximately
5172 PB, on 64-bit systems.
52
53The remaining bits are placed in the high-order end of the header.
54The four highest-order bits are used for garbage
55collection or internal data type dispatching.
56
57; C_GC_FORWARDING_BIT : Flag used for forwarding garbage collected object pointers.
58
59; C_BYTEBLOCK_BIT : Flag that specifies whether this data object contains raw bytes (a string or blob) or pointers to other data objects.
60
61; C_SPECIALBLOCK_BIT : Flag that specifies whether this object contains a ''special'' non-object pointer value in its first slot. An example for this kind of objects are closures, which are a vector-type object with the code-pointer as the first item. This is also used to turn a pair's car into a weak reference in the symbol table, to allow its symbol to be collected.
62
63; C_8ALIGN_BIT : Flag that specifies whether the data area of this block should be aligned on an 8-byte boundary (floating-point numbers, for example).
64
65After these four bits comes a 4-bit type code representing one of the following types:
66
67'''vectors''': vector objects with type bits {{C_VECTOR_TYPE}}, currently 0000.
68
69'''symbols''': vector objects with type bits {{C_SYMBOL_TYPE}},
70currently 0001. The three slots contain the toplevel variable value,
71the print-name (a string), and the property list of the symbol. When
72manipulating these slots, the symbol table's container needs to be
73manipulated as well, to control garbage collection of the symbol: if
74the symbol is undefined and has no property list, the symbol table's
75container should be a weak reference ({{C_WEAK_PAIR_TYPE}}), otherwise
76it should be a strong reference ({{C_PAIR_TYPE}}).
77
78'''strings''': byte-vector objects with type bits {{C_STRING_TYPE}}, currently 0010.
79
80'''pairs''': vector-like object with type bits {{C_PAIR_TYPE}}, currently 0011.
81The car and the cdr are contained in the first and second slots,
82respectively.
83
84'''closures''': special vector objects with type bits
85{{C_CLOSURE_TYPE}}, currently 0100. The first slot contains a pointer to a
86compiled C function. Any extra slots contain the free variables (since
87a flat closure representation is used).
88
89'''flonums''': byte-vector objects with type bits
90{{C_FLONUM_BITS}}, currently 0101. Slots one and two (or a single slot on
9164 bit architectures) contain a 64-bit floating-point number, in the
92representation used by the host system's C compiler.
93
94'''bignums''': special vector objects with type bits
95{{C_BIGNUM_TYPE}}, currently 0110. This contains only one slot, which
96points to a bytevector that contains the number's limbs in a special
97format: The first word contains a 1 if the number is negative, 0 if it
98is positive. The remaining words form the bignum's limbs. A
99bytevector is used because the limbs are stored in the raw machine
100format, which would be invalid Scheme objects when viewed as slots in
101a vector.
102
103'''ports''': special vector objects with type bits
104{{C_PORT_TYPE}}, currently 0111. The first slot contains a pointer to a file-
105stream, if this is a file-pointer, or NULL if not. The other slots
106contain housekeeping data used for this port.
107
108'''structures''': vector objects with type bits
109{{C_STRUCTURE_TYPE}}, currently 1000. The first slot contains a symbol that
110specifies the kind of structure this record is an instance of. The other
111slots contain the actual record items.
112
113'''blob''': a raw sequence of bytes with type bits {{C_BYTEVECTOR_TYPE}}.
114
115'''pointer-vectors''': vector objects of native pointers - these are
116actually structures where the first slot holds a blob containing the 32- or 64-bit
117pointer values.
118
119'''locatives''': special vector objects with type bits
120{{C_LOCATIVE_TYPE}}, currently 1010. A locative object holds 4 slots:
121a raw pointer to the location inside the object referred to by the locative,
122the offset in bytes from the start of the object referred to, the type of
123the location (whether it refers to an unboxed numeric value or a normal
124object slot that holds a pointer to Scheme data) and a flag indicating
125whether this locative is "weak". If the locative is non-weak, slot #4 holds
126a pointer to the object referred to.
127
128'''pointers''': special vector objects with type bits
129{{C_POINTER_TYPE}}, currently 1001. The single slot contains a machine pointer.
130
131'''tagged pointers''': special vector objects with type bits
132{{C_TAGGED_POINTER_TYPE}}, currently 1011, Tagged pointers are similar to pointers,
133but the object contains an additional
134slot with a tag (an arbitrary data object) that identifies the type
135of the pointer.
136
137'''ratnums''': vector-like objects with type-bits {{C_RATNUM_TYPE}},
138currently 1100. The first slot contains the numerator (which can be
139positive or negative), the second slot contains the denominator, which
140is always positive. These numbers are always simplified, so their gcd
141will always be 1.
142
143'''lambda infos''': byte-vector objects with type-bits {{C_LAMBDA_INFO_TYPE}}, currently 1101.
144
145'''cplxnums''': vector-like objects with type-bits {{C_CPLXNUM_TYPE}},
146currently 1110. The first slot contains the real part, the second slot
147contains the imaginary part of the complex number. These two numbers
148are of matching exactness: Either both are flonums or none are.
149
150The actual data follows immediately after the header. Note that
151block addresses are always aligned to the native machine-word
152boundary.
153
154Data objects may be allocated outside of the garbage collected heap, as
155long as their layout follows the above mentioned scheme. But care has to
156be taken not to mutate these objects with heap-data (i.e. non-immediate
157objects), because this will confuse the garbage collector.
158
159For more information see the header file {{chicken.h}}.
160
161
162---
163Previous: [[C interface]]
164
165Next: [[Modules]]